
    The slope heuristics in practice and the CAPUSHE package

    Implementing the data-driven methods for calibrating penalized criteria that stem from the slope heuristics of Birgé and Massart (2007) raises practical difficulties.

    Slope Heuristics: Overview and Implementation

    INRIA research report RR-7223, version 1. Model selection is a general paradigm which includes many statistical problems. One of the most fruitful and popular approaches to carrying it out is the minimization of a penalized criterion. Birgé and Massart (2006) have proposed a promising data-driven method, the "slope heuristics", to calibrate such criteria when the penalty is known up to a multiplicative factor. Theoretical works validate this heuristic method in some situations, and several papers report promising practical behavior in various frameworks. The purpose of this work is twofold. First, it presents an introduction to the slope heuristics and an overview of the theoretical and practical results about it. Second, it focuses on the practical difficulties that arise when applying the slope heuristics. A new practical approach is proposed and compared to the standard dimension jump method. All the practical solutions discussed in this paper, across the different frameworks, are implemented and brought together in a Matlab graphical user interface called capushe.
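    The calibration idea behind the slope heuristics is compact enough to sketch: for the most complex models, the minimized empirical risk decreases roughly linearly in the penalty shape; the fitted slope gives the minimal penalty constant, and the final penalty is taken as twice that. Below is a minimal Python sketch under those assumptions; the function name and the fit window n_fit are illustrative choices, not part of capushe.

```python
import numpy as np

def slope_heuristics(pen_shape, risk, n_fit=10):
    """Select a model by the slope heuristics (sketch).

    pen_shape : penalty shape pen(m) of each model, known up to a factor
                (e.g. the model dimension)
    risk      : minimized empirical risk of each model (e.g. minus the
                maximized log-likelihood)
    n_fit     : how many of the most complex models to use for the slope
                fit (an illustrative choice, not a capushe default)
    """
    pen_shape = np.asarray(pen_shape, dtype=float)
    risk = np.asarray(risk, dtype=float)
    idx = np.argsort(pen_shape)[-n_fit:]          # most complex models
    slope, _ = np.polyfit(pen_shape[idx], risk[idx], 1)
    kappa_min = -slope                            # minimal penalty constant
    crit = risk + 2.0 * kappa_min * pen_shape     # twice the minimal penalty
    return int(np.argmin(crit)), kappa_min
```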

    Model selection for unsupervised classification

    We review the foundations of the mixture-model approach to unsupervised classification.

    Enhancing the selection of a model-based clustering with external qualitative variables

    In cluster analysis, it can be useful to interpret the partition built from the data in the light of external categorical variables which were not directly involved in clustering the data. An approach is proposed in the model-based clustering context to select a model and a number of clusters which both fit the data well and take advantage of the potential illustrative ability of the external variables. This approach makes use of the integrated joint likelihood of the data and the partitions at hand, namely the model-based partition and the partitions associated with the external variables. Notably, each mixture model is fitted to the data by maximum likelihood without the external variables, which are used only to select a relevant mixture model. Numerical experiments illustrate the promising behaviour of the derived criterion.
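    To make the general idea concrete, here is a toy Python sketch of such a selection rule: the mixture is fitted by maximum likelihood as usual, and the score adds to a BIC-style approximation of the integrated data likelihood a Dirichlet-multinomial integrated likelihood of the external partition given the fitted clusters. The names, the uniform Dirichlet prior, and the exact combination are illustrative assumptions, not the authors' criterion.

```python
import numpy as np
from scipy.special import gammaln
from sklearn.mixture import GaussianMixture

def score(X, u, K, a=1.0):
    """Toy criterion (higher is better): BIC-style integrated likelihood
    of the data plus a Dirichlet-multinomial integrated likelihood of the
    external partition u (integer-coded) given the fitted clusters.
    Illustrative sketch only, not the authors' exact criterion."""
    gm = GaussianMixture(n_components=K, random_state=0).fit(X)
    z = gm.predict(X)                       # model-based partition
    J = int(u.max()) + 1
    n = np.zeros((K, J))                    # contingency table z x u
    np.add.at(n, (z, u), 1)
    # log integrated multinomial likelihood of u within each cluster,
    # under a uniform Dirichlet(a) prior (an assumption of this sketch)
    log_pu = np.sum(gammaln(J * a) - gammaln(J * a + n.sum(axis=1))
                    + np.sum(gammaln(n + a) - gammaln(a), axis=1))
    return -gm.bic(X) / 2.0 + log_pu        # -BIC/2 ~ log p(X | model)

# pick the number of clusters maximizing the joint score, e.g.:
# best_K = max(range(1, 11), key=lambda K: score(X, u, K))
```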

    Consistent Regression using Data-Dependent Coverings

    In this paper, we introduce a novel method to generate interpretable regression function estimators. The idea is based on so-called data-dependent coverings. The aim is to extract from the data a covering of the feature space instead of a partition. The estimator predicts the empirical conditional expectation over the cells of the partition generated from the covering. Thus, such an estimator has the same form as those issued from data-dependent partitioning algorithms. We give sufficient conditions ensuring consistency, avoiding the cell-shrinkage condition that appears in the earlier literature. Doing so, we reduce the number of covering elements. We show that such coverings are interpretable and that each element of the covering is tagged as significant or insignificant. The proof of consistency is based on a control of the error of the empirical estimation of conditional expectations, which is of interest in its own right.
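    The form of the estimator is easy to sketch: overlapping covering elements induce a partition through their activation patterns, and the prediction in a cell is the average response of the training points falling in that cell. A minimal Python sketch follows, where the rule representation and the fallback to the global mean for empty cells are illustrative assumptions.

```python
import numpy as np

def covering_regressor(rules, X_train, y_train):
    """Regression estimator induced by a covering (sketch).

    rules : list of boolean functions r(x) -> bool, the (possibly
            overlapping) covering elements; the activation pattern of
            the rules at a point defines its cell in the induced partition.
    """
    def signatures(X):
        return np.array([[r(x) for r in rules] for x in X], dtype=bool)

    sig_train = signatures(X_train)
    y_train = np.asarray(y_train, dtype=float)

    def predict(X):
        out = np.empty(len(X))
        for i, s in enumerate(signatures(X)):
            in_cell = (sig_train == s).all(axis=1)   # same partition cell
            # empirical conditional expectation over the cell; falling
            # back to the global mean on empty cells is an assumption
            out[i] = y_train[in_cell].mean() if in_cell.any() else y_train.mean()
        return out

    return predict

# usage sketch with two overlapping interval rules on a 1-d feature:
# predict = covering_regressor([lambda x: x[0] < 0.6, lambda x: x[0] > 0.4], X, y)
```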

    Model selection for clustering in the presence of an external classification

    In unsupervised classification, it is often useful to interpret the clustering being sought in light of a partition of the individuals that is known a priori and obtained from information other than the available data. We propose an approach based on mixture models which selects a clustering model and a number of clusters so as to produce a clustering that both fits the data well and is strongly related to the a priori partition. This approach uses the joint integrated likelihood of the data and of the two classifications at hand. Note that the a priori partition enters only the model selection phase, not the construction of the clustering, which is carried out classically by maximum likelihood. Illustrations will be given, and the decoupling of the estimation and model selection steps will be discussed.

    Combining Mixture Components for Clustering

    Model-based clustering consists of fitting a mixture model to data and identifying each cluster with one of its components. Multivariate normal distributions are typically used. The number of clusters is usually determined from the data, often using BIC. In practice, however, individual clusters can be poorly fitted by Gaussian distributions, and in that case model-based clustering tends to represent one non-Gaussian cluster by a mixture of two or more Gaussian distributions. If the number of mixture components is interpreted as the number of clusters, this can lead to an overestimation of the number of clusters. This is because BIC selects the number of mixture components needed to provide a good approximation to the density, rather than the number of clusters as such. We propose first selecting the total number of Gaussian mixture components, K, using BIC and then combining them hierarchically according to an entropy criterion. This yields a unique soft clustering for each number of clusters less than or equal to K; these clusterings can be compared on substantive grounds. We illustrate the method with simulated data and a flow cytometry dataset.
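    The merging step can be sketched directly from this description: starting from the posterior probabilities of the K BIC-selected components, repeatedly merge the pair of components whose combination (adding their posteriors) yields the lowest entropy of the soft clustering. A short Python sketch under those assumptions; K_max and the greedy pairwise search are illustrative choices.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def combine_components(X, K_max=10):
    """Fit GMMs, select K by BIC, then merge components hierarchically by
    minimizing the entropy of the soft clustering (sketch; merging two
    components simply adds their posterior probabilities)."""
    fits = [GaussianMixture(n_components=k, random_state=0).fit(X)
            for k in range(1, K_max + 1)]
    tau = min(fits, key=lambda g: g.bic(X)).predict_proba(X)  # n x K posteriors

    solutions = {tau.shape[1]: tau}           # one soft clustering per K
    while tau.shape[1] > 1:
        K = tau.shape[1]
        best_ent, best_tau = np.inf, None
        for j in range(K):
            for l in range(j + 1, K):
                merged = np.delete(tau, l, axis=1)
                merged[:, j] = tau[:, j] + tau[:, l]
                ent = -np.sum(merged * np.log(np.clip(merged, 1e-300, None)))
                if ent < best_ent:            # keep the lowest-entropy merge
                    best_ent, best_tau = ent, merged
        tau = best_tau
        solutions[K - 1] = tau
    return solutions
```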